Efficient Text Proximity Search
Abstract
In addition to purely occurrence-based relevance models, term proximity has frequently been used to enhance the retrieval quality of keyword-oriented retrieval systems. While effective scoring functions that incorporate proximity have been proposed, there has been little work on algorithms or access methods for evaluating them efficiently. This paper presents an efficient evaluation framework that includes a proximity scoring function integrated within a top-k query engine for text retrieval. We propose precomputed and materialized index structures that boost performance. The increased retrieval effectiveness and efficiency of our framework are demonstrated through extensive experiments on a very large text benchmark collection. In combination with static index pruning for the proximity lists, our algorithm achieves an improvement of two orders of magnitude compared to a term-based top-k evaluation, with significantly improved result quality.
Similar Resources
Efficient Engines for Keyword Proximity Search
This paper presents a formal framework for investigating keyword proximity search. Within this framework, three variants of keyword proximity search are defined. For each variant, there are algorithms for enumerating all the results in an arbitrary order, in the exact order and in an approximate order. The algorithms for enumerating in the exact order make the inevitable assumption that the siz...
WWW Search Systems Using SQL*TextRetrieval and Parallel Server for Structured and Unstructured Data
We describe our experience in developing Web Search Systems using Oracle’s SQL*TextRetrieval. In the prototype system we store on-line books in HTML and the HTML documents of a web site; SQL*TextRetrieval is used to index full text and other structured data in the ‘web space’ and to provide an efficient search engine for free-text search. The Web enables global access to and maximum informa...
Enterprise Text Processing: A Sparse Matrix Approach
Documents, both internal and related publicly available ones, are now considered a corporate asset. The potential to efficiently and accurately search such documents is of great significance. We demonstrate the application of sparse matrix-vector multiplication algorithms for text storage and retrieval as a means of supporting efficient and accurate text processing. As many parallel sparse matrix-ve...
Reverse Top-k Search using Random Walk with Restart
With the increasing popularity of social networks, large volumes of graph data are becoming available. Large graphs are also derived by structure extraction from relational, text, or scientific data (e.g., relational tuple networks, citation graphs, ontology networks, protein-protein interaction graphs). Node-to-node proximity is the key building block for many graph-based applications that sea...
Using Transformation Techniques Towards Efficient Filtration of String Proximity Search of Biological Sequences
The problem of proximity search in biological databases is addressed. We study vector transformations and apply DFT (Discrete Fourier Transformation) and DWT (Discrete Wavelet Transformation, Haar) dimensionality reduction techniques to DNA sequence proximity search to reduce the search time of range queries. Our empirical results on a number of Prokaryote and Eukaryote DNA ...